Automated Information Extraction using Amorphic
Abstract
Amorphic is an adaptive web information extraction system for building intelligent applications that mine information from web pages. It can locate data of interest based on domain knowledge or page structure, automatically generate a wrapper for an information source, and detect when the structure of a web-based resource has changed, then use this knowledge to search the updated resource for the desired information. This allows Amorphic to adapt to the changing structure of websites, so users can manage their information extraction more effectively. Five example implementations are described to illustrate the need for systems capable of extracting information from semi-structured web documents. They demonstrate the versatility of the approach, showing how a system like Amorphic can be used in systematic data extraction applications that require data collection over an extended period of time. The current Amorphic system represents a cost-effective approach to developing large-scale, adaptable information extraction systems for a variety of domains.
Similar resources
Laser Raman Studies of Polycrystalline and Amorphic Diamond Films
This report describes the results of a number of different, but related, laser Raman studies on various CVD diamond samples. It attempts to show the versatility of Raman spectroscopy as a diagnostic tool for the quality of diamond films, by demonstrating its use in a few novel applications. The studies of the laser Raman spectra of amorphic diamond and CVD diamond films are performed using lase...
On amorphic C-algebras
An amorphic association scheme has the property that any of its fusions is also an association scheme. In this paper we generalize the property of being amorphic to an arbitrary C-algebra and prove that any amorphic C-algebra is determined up to isomorphism by the multiset of its degrees and an additional integer equal to ±1. Moreover, we show that any amorphic C-algebra with rational structure consta...
Automated Data Extraction from Online Social Network Profiles: Unique Ethical Challenges for Researchers
As the use of online social networking (OSN) sites increases, data extraction from OSN profiles is providing researchers with a rich source of data. Data extraction is divided into non-automated and automated approaches. However, researchers face a variety of ethical challenges, especially when using automated data extraction approaches. In social networking, there has been a lack of research tha...
DNA profiling from heroin street dose packages.
A large number of heroin street doses are seized and examined for drug content by the Israel police. These are generally wrapped in heat-sealed plastic. Occasionally it is possible to visualize latent fingerprints on the plastic wrap itself, but the small size of the plastic item and the sealing process make the success rate very low. In this study, the possibility of extracting and profiling ...